Decomposing the load-store queue by function for power reduction and scalability

نویسندگان

  • Lee Baugh
  • Craig B. Zilles
چکیده

Because they are based on large content-addressable memories, load-store queues (LSQ) present implementation challenges in superscalar processors, especially as issue width and number of in-flight instructions are scaled. In this paper, we propose an alternate organization of an LSQ that separates the forwarding functionality from checking that loads received their correct values. Two main techniques are exploited: 1) the store forwarding logic is only accessed by those loads and stores that are likely to be involved in forwarding, and 2) the checking structure is banked by address. The result of these techniques is that a small collection of small, low bandwidth structures can be substituted for the large, high bandwidth structures used in conventional designs. By our calculations, these proposed techniques reduce LSQ dynamic power by a factor of 3-5 while achieving equivalent performance.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Power-Efficient and Scalable Load-Store Queue Design

The load-store queue (LQ-SQ) of modern superscalar processors is responsible for keeping the order of memory operations. As the performance gap between processing speed and memory access becomes worse, the capacity requirements for the LQ-SQ increase, and its design becomes a challenge due to its CAM structure. In this paper we propose an efficient load-store queue state filtering mechanism tha...

متن کامل

Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load/Store Optimization

A high-bandwidth, low-latency load-store unit is a critical component of a dynamically scheduled processor. Unfortunately, it is also one of the most complex and non-scalable components. Recently, several researchers have proposed techniques that simplify the core load-store unit and improve its scalability in exchange for the in-order pre-retirement re-execution of some subset of the loads in ...

متن کامل

Scaling Load-Store Queue

In order to tolerate long latency instructions, large load and store queue is necessary to bypass in flight information to dependent instruction; but as the latency goes up, the size of the load and store queue will increase as well, which will impact cycle time, area and power. Hierarchical designs in [2] and [10] was proposed to alleviate cycle time problem, but the CAM and search functions r...

متن کامل

A High-Bandwidth Load-Store Unit for Single- and Multi-Threaded Processors

Abstract A store queue (SQ) is a critical component of the load execution machinery. High ILP processors require high load execution bandwidth, but providing high bandwidth SQ access is difficult. Address banking, which works well for caches, conflicts with age-ordering which is required for the SQ and multi-porting exacerbates the latency of the associative searches that load execution require...

متن کامل

Join-Idle-Queue: A novel load balancing algorithm for dynamically scalable web services

The prevalence of dynamic-content web services, exemplified by search and online social networking, has motivated an increasingly wide web-facing front end. Horizontal scaling in the Cloud is favored for its elasticity, and distributed design of load balancers is highly desirable. Existing algorithms with a centralized design, such as Join-the-Shortest-Queue (JSQ), incur high communication over...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • IBM Journal of Research and Development

دوره 50  شماره 

صفحات  -

تاریخ انتشار 2006